Extracting tree fragments in linear average time
Author
Abstract
This report details the implementation of a fragment extraction algorithm using an average-case linear-time tree kernel. Given a treebank, the algorithm extracts all fragments that occur at least twice, along with their frequencies. Evaluation shows a -fold speedup over a quadratic fragment extraction implementation. Additionally, we add support for trees with discontinuous constituents.
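To make the task concrete, here is a minimal Python sketch of naive pairwise fragment extraction. It is not the report's linear average time tree kernel. It is closer in spirit to a quadratic baseline: it compares every pair of nodes across every pair of trees. The tree encoding (nested tuples, with a one-element tuple marking a frontier nonterminal) and all function names are assumptions made for illustration. The sketch does not filter for maximal fragments and does not handle discontinuous constituents.

```python
from collections import Counter
from itertools import combinations

# Assumed encoding: a tree is (label, child, ...); a word is a plain string;
# a frontier nonterminal inside a fragment is a one-element tuple (label,).

def production(node):
    """Return the grammar production at a node, e.g. ('S', 'NP', 'VP')."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def nodes(tree):
    """Yield every nonterminal node of a tree, top-down."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def common_fragment(a, b):
    """Largest fragment rooted at both a and b (their productions must match)."""
    children = []
    for ca, cb in zip(a[1:], b[1:]):
        if isinstance(ca, str):
            children.append(ca)                       # identical word
        elif production(ca) == production(cb):
            children.append(common_fragment(ca, cb))  # keep expanding
        else:
            children.append((ca[0],))                 # stop: frontier nonterminal
    return (a[0],) + tuple(children)

def matches(frag, node):
    """True if the fragment occurs rooted at this node."""
    if isinstance(frag, str):
        return frag == node
    if isinstance(node, str) or frag[0] != node[0]:
        return False
    if len(frag) == 1:                                # frontier matches any subtree
        return True
    return len(frag) == len(node) and all(
        matches(f, n) for f, n in zip(frag[1:], node[1:]))

def extract_fragments(treebank):
    """Map each fragment shared by at least two trees to its frequency."""
    fragments = set()
    for t1, t2 in combinations(treebank, 2):          # quadratic in treebank size
        for a in nodes(t1):
            for b in nodes(t2):
                if production(a) == production(b):
                    fragments.add(common_fragment(a, b))
    counts = Counter()
    for frag in fragments:
        counts[frag] = sum(matches(frag, n) for t in treebank for n in nodes(t))
    return counts

treebank = [
    ('S', ('NP', 'she'), ('VP', ('V', 'saw'), ('NP', 'him'))),
    ('S', ('NP', 'she'), ('VP', ('V', 'met'), ('NP', 'him'))),
]
for fragment, frequency in extract_fragments(treebank).items():
    print(frequency, fragment)
```

On this toy treebank the two verbs differ, so the shared sentence-level fragment keeps `V` as an open frontier node while the two noun phrases stay fully lexicalized.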
Similar resources
A method for analyzing the problem of determining the maximum common fragments of temporal directed tree, that do not change with time
This study considers and solves two types of problems: 1) determining the maximum common connected fragment of the T-tree (T-directed tree) which does not change with time; 2) determining all maximum common connected fragments of the T-tree (T-directed tree) which do not change with time. The choice of the primary study of temporal directed trees and trees is justified by the wid...
Discontinuous Parsing with an Efficient and Accurate DOP Model
We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We g...
Извлечение низкочастотных терминов из специализированных текстов (Extraction of Low-Frequent Terms from Domain-Specific Texts)
We examined a method for extracting important low-frequency single-word terms from domain-specific text. First, domain-relevant fragments were extracted from the text with the help of a dependency tree. Then the fragments were clustered, and candidate terms were identified using a semantic classifier. The studies suggest that this approach can extract even terms that occur only once.
LZ77 Factorisation of Trees
We generalise the fundamental concept of LZ77 factorisation from strings to trees. A tree is represented as a collection of edge-disjoint fragments, each of which either consists of one node or has already occurred earlier (in the BFS order). As with strings, such a collection uniquely determines the tree, so by minimising the number of fragments we obtain a compressed representation of the tree....
An improved algorithm to reconstruct a binary tree from its inorder and postorder traversals
It is well known that a binary tree can be determined uniquely from its inorder traversal together with either its preorder or its postorder traversal. Several algorithms have been proposed to reconstruct a binary tree from its inorder and preorder traversals. There is one study to reconstruct a binary tree from its inorder and postorder traversals, and this algorithm takes running time of...